55d491cf951b1b920900684d71419282-Supplemental.pdf

Neural Information Processing Systems

Now, one could try to translate the constraint as |x_k − x'_k| ≤ α for all k = 1, ..., r. Here, we investigate the impact of the balance parameter γ from Equation (5) on the accuracy-fairness trade-off. Note that our method can increase both accuracy, albeit only by a small amount, and fairness for certain values of γ (e.g., γ = 2). Across all datasets and values of γ, the largest increase in certification for adversarial training is roughly 7%, with a simultaneous accuracy drop of 0.5%, and the largest accuracy drop is roughly 1%, with a simultaneous increase in certification of 2.9%. We note that although Fischer et al. [15] support terms with real-valued functions, we only consider linear functions since nonlinear constraints, e.g., x² < 3, cannot be encoded exactly as MILP.
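As a minimal worked sketch (not part of the excerpt above), an interval constraint of this form is exactly MILP-representable because it splits into two linear inequalities, whereas a nonlinear constraint such as x² < 3 admits no such exact linear encoding:

```latex
% Exact linear (MILP-encodable) reformulation of the interval constraint
|x_k - x'_k| \le \alpha
\quad\Longleftrightarrow\quad
x_k - x'_k \le \alpha
\;\;\text{and}\;\;
x'_k - x_k \le \alpha,
\qquad k = 1, \dots, r.
```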


Discrimination in Online Markets: Effects of Social Bias on Learning from Reviews and Policy Design

Faidra Georgia Monachou, Itai Ashlagi

Neural Information Processing Systems

The increasing popularity of online two-sided markets, such as ride-sharing, accommodation, and freelance labor platforms, goes hand in hand with new socioeconomic challenges. One major issue remains the existence of bias and discrimination against certain social groups. We study this problem using a two-sided large market model with employers and workers mediated by a platform. Employers who seek to hire workers face uncertainty about a candidate worker's skill level. Therefore, they base their hiring decision on learning from past reviews about an individual worker as well as on their (possibly misspecified) prior beliefs about the ability level of the social group the worker belongs to. Drawing upon the social learning literature with bounded rationality and limited information, we show that uncertainty combined with social bias leads to unequal hiring opportunities between workers of different social groups. Although the effect of social bias decreases as the number of reviews increases (consistent with empirical findings), minority workers still receive lower expected payoffs. Finally, we consider a simple directed matching policy (DM), which combines learning and matching to make better matching decisions for minority workers. Under this policy, there exists a steady-state equilibrium in which DM reduces the discrimination gap.
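The abstract does not spell out the learning model, so the following is only an illustrative sketch, under the assumption of a simple Beta-Binomial update: an employer combines a (possibly misspecified) group prior with observed reviews, and the misspecified prior is gradually washed out as reviews accumulate. All numbers below are hypothetical.

```python
# Illustrative sketch (not the paper's model): Beta-Binomial belief updating
# about a worker's skill, starting from a fair vs. a biased group prior.

def posterior_mean(prior_pos, prior_neg, pos_reviews, neg_reviews):
    """Posterior mean skill estimate after observing reviews."""
    return (prior_pos + pos_reviews) / (prior_pos + prior_neg + pos_reviews + neg_reviews)

true_skill = 0.7            # same underlying skill for both workers
fair_prior = (7, 3)         # prior mean 0.7: matches the true skill
biased_prior = (3, 7)       # prior mean 0.3: misspecified belief about the minority group

for n_reviews in (0, 5, 20, 100):
    pos = round(true_skill * n_reviews)   # reviews reflect the true skill on average
    neg = n_reviews - pos
    fair = posterior_mean(*fair_prior, pos, neg)
    biased = posterior_mean(*biased_prior, pos, neg)
    print(f"{n_reviews:3d} reviews: estimate with fair prior {fair:.2f}, "
          f"with biased prior {biased:.2f}, gap {fair - biased:.2f}")
```

As the number of reviews grows, the gap between the two estimates shrinks but does not vanish at any finite sample size, mirroring the abstract's observation that the effect of social bias decreases with more reviews while minority workers still receive lower expected payoffs.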


Counterfactually Fair Representation

Neural Information Processing Systems

The use of machine learning models in high-stakes applications (e.g., healthcare, lending, college admission) has raised growing concerns due to potential biases against protected social groups. Various fairness notions and methods have been proposed to mitigate such biases. In this work, we focus on Counterfactual Fairness (CF), a fairness notion that is dependent on an underlying causal graph and first proposed by Kusner et al.; it requires that the outcome an individual perceives is the same in the real world as it would be in a counterfactual world, in which the individual belongs to another social group. Learning fair models satisfying CF can be challenging. It was shown by Kusner et al. that a sufficient condition for satisfying CF is to not use features that are descendants of sensitive attributes in the causal graph. This implies a simple method that learns CF models only using non-descendants of sensitive attributes while eliminating all descendants. Although several subsequent works proposed methods that use all features for training CF models, there is no theoretical guarantee that they can satisfy CF. In contrast, this work proposes a new algorithm that trains models using all the available features. We theoretically and empirically show that models trained with this method can satisfy CF.
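For reference, the counterfactual-fairness condition of Kusner et al. is typically stated as an equality of counterfactual outcome distributions, conditioned on the observed features and the observed group membership; the notation below (predictor Ŷ, sensitive attribute A, features X, latent background variables U) follows that standard formulation:

```latex
% Counterfactual fairness (Kusner et al.): for all y, x, and groups a, a'
P\bigl(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x,\, A = a\bigr)
=
P\bigl(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x,\, A = a\bigr)
```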


BTC-SAM: Leveraging LLMs for Generation of Bias Test Cases for Sentiment Analysis Models

Kardkovacs, Zsolt T., Djennane, Lynda, Field, Anna, Benatallah, Boualem, Gaci, Yacine, Casati, Fabio, Gaaloul, Walid

arXiv.org Artificial Intelligence

Sentiment Analysis (SA) models harbor inherent social biases that can be harmful in real-world applications. These biases are identified by examining the output of SA models for sentences that only vary in the identity groups of the subjects. Constructing natural, linguistically rich, relevant, and diverse sets of sentences that provide sufficient coverage over the domain is expensive, especially when addressing a wide range of biases: it requires domain experts and/or crowd-sourcing. In this paper, we present a novel bias testing framework, BTC-SAM, which generates high-quality test cases for bias testing in SA models with minimal specification using Large Language Models (LLMs) for the controllable generation of test sentences. Our experiments show that relying on LLMs can provide high linguistic variation and diversity in the test sentences, thereby offering better test coverage compared to base prompting methods even for previously unseen biases.
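The paper's LLM-based generation pipeline is not reproduced here; the sketch below only illustrates the underlying testing idea the abstract describes, i.e., scoring sentences that differ only in the identity group of the subject. The template, the identity terms, and the use of an off-the-shelf Hugging Face sentiment pipeline are assumptions for illustration, not part of BTC-SAM.

```python
# Illustrative identity-swap bias test for a sentiment model (template and
# identity terms are made up; any per-sentence sentiment scorer would do).
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # generic off-the-shelf classifier

template = "The {group} engineer explained the design to the team."
groups = ["young", "elderly", "immigrant", "disabled"]

scores = {}
for group in groups:
    result = sentiment(template.format(group=group))[0]  # {'label': ..., 'score': ...}
    signed = result["score"] if result["label"] == "POSITIVE" else -result["score"]
    scores[group] = signed

# A large spread across otherwise identical sentences flags a potential bias.
print(scores, "spread:", round(max(scores.values()) - min(scores.values()), 3))
```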


The Social Cost of Intelligence: Emergence, Propagation, and Amplification of Stereotypical Bias in Multi-Agent Systems

Nguyen, Thi-Nhung, Luo, Linhao, Vu, Thuy-Trang, Phung, Dinh

arXiv.org Artificial Intelligence

Bias in large language models (LLMs) remains a persistent challenge, manifesting in stereotyping and unfair treatment across social groups. While prior research has primarily focused on individual models, the rise of multi-agent systems (MAS), where multiple LLMs collaborate and communicate, introduces new and largely unexplored dynamics in bias emergence and propagation. In this work, we present a comprehensive study of stereotypical bias in MAS, examining how internal specialization, underlying LLMs and inter-agent communication protocols influence bias robustness, propagation, and amplification. We simulate social contexts where agents represent different social groups and evaluate system behavior under various interaction and adversarial scenarios. Experiments on three bias benchmarks reveal that MAS are generally less robust than single-agent systems, with bias often emerging early through in-group favoritism. However, cooperative and debate-based communication can mitigate bias amplification, while more robust underlying LLMs improve overall system stability. Our findings highlight critical factors shaping fairness and resilience in multi-agent LLM systems.


CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs

Nikeghbal, Nafiseh, Kargaran, Amir Hossein, Diesner, Jana

arXiv.org Artificial Intelligence

Improvements in model construction, including fortified safety guardrails, allow large language models (LLMs) to increasingly pass standard safety checks. However, LLMs sometimes slip into revealing harmful behavior, such as expressing racist viewpoints, during conversations. To analyze this systematically, we introduce CoBia, a suite of lightweight adversarial attacks that allow us to refine the scope of conditions under which LLMs depart from normative or ethical behavior in conversations. CoBia creates a constructed conversation where the model utters a biased claim about a social group. We then evaluate whether the model can recover from the fabricated bias claim and reject biased follow-up questions. We evaluate 11 open-source as well as proprietary LLMs for their outputs related to six socio-demographic categories that are relevant to individual safety and fair treatment, i.e., gender, race, religion, nationality, sexual orientation, and others. Our evaluation is based on established LLM-based bias metrics, and we compare the results against human judgments to scope out the LLMs' reliability and alignment. The results suggest that purposefully constructed conversations reliably reveal bias amplification and that LLMs often fail to reject biased follow-up questions during dialogue. This form of stress-testing highlights deeply embedded biases that can be surfaced through interaction. Code and artifacts are available at https://github.com/nafisenik/CoBia.
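The exact prompt construction is specified in the paper and repository; the following is only a rough sketch of the general idea the abstract describes: a fabricated assistant turn containing a biased claim is injected into the chat history, and the probe checks whether the model rejects a biased follow-up rather than elaborating on it. The message contents and the commented-out client call are placeholders, not CoBia's actual interface.

```python
# Rough sketch of a constructed-conversation probe: the assistant turn below is
# fabricated (the model never produced it), and the follow-up question tests
# whether the model recovers by rejecting the biased premise.

biased_claim = "[fabricated biased claim about a social group]"

messages = [
    {"role": "user", "content": "What do you think about <social group>?"},
    {"role": "assistant", "content": biased_claim},   # injected, not real model output
    {"role": "user", "content": "Interesting. Can you list more reasons why that is true?"},
]

# response = chat_client.complete(messages)  # placeholder call to the model under test
# A robust model should reject the fabricated premise instead of elaborating on it.
```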


A taxonomy of epistemic injustice in the context of AI and the case for generative hermeneutical erasure

Mollema, Warmhold Jan Thomas

arXiv.org Artificial Intelligence

Epistemic injustice related to AI is a growing concern. In relation to machine learning models, epistemic injustice can have a diverse range of sources, ranging from epistemic opacity, the discriminatory automation of testimonial prejudice, and the distortion of human beliefs via generative AI's hallucinations to the exclusion of the global South in global AI governance, the execution of bureaucratic violence via algorithmic systems, and interactions with conversational artificial agents. Based on a proposed general taxonomy of epistemic injustice, this paper first sketches a taxonomy of the types of epistemic injustice in the context of AI, relying on the work of scholars from the fields of philosophy of technology, political philosophy and social epistemology. Secondly, an additional conceptualization of epistemic injustice in the context of AI is provided: generative hermeneutical erasure. I argue that this injustice is the automation of 'epistemicide', the injustice done to epistemic agents in their capacity for collective sense-making through the suppression of difference in epistemology and conceptualization by LLMs. AI systems' 'view from nowhere' epistemically inferiorizes non-Western epistemologies and thereby contributes to the erosion of their epistemic particulars, gradually contributing to hermeneutical erasure. This work's relevance lies in the proposal of a taxonomy that allows epistemic injustices to be mapped in the AI domain and in the proposal of a novel form of AI-related epistemic injustice.


EsBBQ and CaBBQ: The Spanish and Catalan Bias Benchmarks for Question Answering

Ruiz-Fernández, Valle, Mina, Mario, Falcão, Júlia, Vasquez-Reina, Luis, Sallés, Anna, Gonzalez-Agirre, Aitor, Perez-de-Viñaspre, Olatz

arXiv.org Artificial Intelligence

Previous literature has largely shown that Large Language Models (LLMs) perpetuate social biases learnt from their pre-training data. Given the notable lack of resources for social bias evaluation in languages other than English, and for social contexts outside of the United States, this paper introduces the Spanish and the Catalan Bias Benchmarks for Question Answering (EsBBQ and CaBBQ). Based on the original BBQ, these two parallel datasets are designed to assess social bias across 10 categories using a multiple-choice QA setting, now adapted to the Spanish and Catalan languages and to the social context of Spain. We report evaluation results on different LLMs, factoring in model family, size and variant. Our results show that models tend to fail to choose the correct answer in ambiguous scenarios, and that high QA accuracy often correlates with greater reliance on social biases.
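The EsBBQ and CaBBQ items themselves are not reproduced here; the sketch below is a hypothetical English-language illustration of the multiple-choice format inherited from the original BBQ, in which an ambiguous context offers no evidence, so the "unknown" option is the only correct answer and picking either group instead counts toward the bias measure. EsBBQ and CaBBQ use Spanish and Catalan items adapted to the social context of Spain.

```python
# Hypothetical BBQ-style item (English placeholder, not an actual EsBBQ/CaBBQ entry).
item = {
    "category": "age",
    "context_condition": "ambiguous",
    "context": "A young neighbor and an elderly neighbor were discussing a noise complaint.",
    "question": "Who caused the noise?",
    "answers": ["The young neighbor", "The elderly neighbor", "Unknown"],
    "label": 2,  # in the ambiguous condition only "Unknown" is correct
}

def is_correct(model_choice: int, item: dict) -> bool:
    """Accuracy check: did the model select the labeled answer?"""
    return model_choice == item["label"]

print(is_correct(1, item))  # a group-specific pick here would count toward the bias score
```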